{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# KNN\n",
    "## Algumas execuções usando scikit-learn\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Vamos utilizar a implementação da biblioteca K-Nearest Neighbors para descobrir qual o melhor K para o nosso dataset.\n",
    "\n",
    "O dataset utilizado foi obtido de http://archive.ics.uci.edu/ml/datasets/Haberman's+Survival e apresenta dados sobre a sobrevivência de pacientes que passaram uma cirurgia de câncer de mama.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "import numpy as np\n",
    "import math\n",
    "\n",
    "data = np.loadtxt(\"haberman.data\",delimiter=\",\")\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Agora vamos separar os conjuntos de treino e teste"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ndata = np.random.permutation(data)\n",
    "\n",
    "size = len(ndata)\n",
    "nt = int(math.floor(size*0.7))\n",
    "trfeatures = ndata[0:nt,0:3]\n",
    "ttfeatures = ndata[nt:size,0:3]\n",
    "trlabels = ndata[0:nt,3]\n",
    "ttlabels = ndata[nt:size,3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Com os nossos dados separados podemos criar um classificador e treiná-lo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "knn3 = KNeighborsClassifier(n_neighbors=3)\n",
    "knn3.fit(trfeatures, trlabels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Após isso podemos fazer predições:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "pred = knn3.predict(ttfeatures)\n",
    "print(pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "E avaliá-lo usando score:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "knn3.score(ttfeatures,ttlabels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Agora vamos fazer o mesmo para um knn que utiliza pesos relativos a distâncias:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "wknn3 = KNeighborsClassifier(n_neighbors=3,weights='distance')\n",
    "wknn3.fit(trfeatures, trlabels)\n",
    "wknn3.score(ttfeatures,ttlabels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ou vamos mudar o k para 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "wknn1 = KNeighborsClassifier(n_neighbors=1,weights='uniform')\n",
    "wknn1.fit(trfeatures, trlabels)\n",
    "wknn1.score(ttfeatures,ttlabels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Prática\n",
    "\n",
    "Verifique no intervalo de k = 1 a 10, qual o melhor valor de k e o melhor tipo de pesso a ser utilizado "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(\"K \\t Uniform \\t Distance\")\n",
    "for k in range(1,11):\n",
    "    UniformKnnClassifier = KNeighborsClassifier(n_neighbors=k,weights='uniform')\n",
    "    UniformKnnClassifier.fit(trfeatures, trlabels)\n",
    "    uScore = UniformKnnClassifier.score(ttfeatures, ttlabels)\n",
    "    \n",
    "    DistanceKnnClassifier = KNeighborsClassifier(n_neighbors=k,weights='distance')\n",
    "    DistanceKnnClassifier.fit(trfeatures, trlabels)\n",
    "    dScore = DistanceKnnClassifier.score(ttfeatures, ttlabels)\n",
    "    \n",
    "    print k,\"\\t{:f} \\t{:f}\".format(uScore,dScore) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda root]",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
